QBSUM: A large-scale query-based document summarization dataset from real-world applications

Abstract

Query-based document summarization aims to extract or generate a summary of a document which directly answers or is relevant to the search query. It is an important technique that can be beneficial to a variety of applications such as search engines, document-level machine reading comprehension, and chatbots. Currently, datasets designed for query-based summarization are short in numbers, and existing datasets are also limited in both scale and quality. Moreover, to the best of our knowledge, there is no publicly available dataset for Chinese query-based document summarization. In this paper, we present QBSUM, a high-quality large-scale dataset consisting of 49,000+ data samples for the task of Chinese query-based document summarization. We also propose multiple unsupervised and supervised solutions to the task and demonstrate their high-speed inference and superior performance via both offline experiments and online A/B tests. The QBSUM dataset is released in order to facilitate future advancement of this research field.

Similar resources

RDF Keyword-based Query Technology Meets a Real-World Dataset

This paper presents the results of an industrial project, conducted by the TecGraf Institute and Petrobras (the Brazilian Petroleum Company), to develop a tool to facilitate access to a large database, with hydrocarbon exploration data, by combining RDF technology with keyword search. The tool features an algorithm to translate a keyword query into a SPARQL query such that each result of the SP...

Query-Based Summarization Based on Document Graphs

Text summarization is an important problem, which has numerous applications. This problem has been extensively studied and many approaches have been proposed in the literature for its solution. One of the most challenging problems in the field of text summarization is generating a user-focused summary based on a query. In this paper, we investigate a new approach that tackles this problem and p...

Evaluation Challenges in Large-Scale Document Summarization

We present a large-scale meta evaluation of eight evaluation measures for both single-document and multi-document summarizers. To this end we built a corpus consisting of (a) 100 Million automatic summaries using six summarizers and baselines at ten summary lengths in both English and Chinese, (b) more than 10,000 manual abstracts and extracts, and (c) 200 Million automatic document and summary...

CLASSY Query-Based Multi-Document Summarization

Our summarizer is based on an HMM (Hidden Markov Model) for sentence selection within a document and a pivoted QR algorithm to generate a multi-document summary. Each year, since we began participating in DUC in 2001, we have modified the features used by the HMM and have added linguistic capabilities in order to improve the summaries we generate. Our system, called “CLASSY” (Clustering, Lingui...

MultiSum: Query-Based Multi-Document Summarization

This paper describes a generic, open-domain multi-document summarisation system which combines new and existing techniques in a novel way. The system is capable of automatically identifying query-related online documents and compiling a report from the most useful sources, whilst presenting the result in such a way as to make it easy for the researcher to look up the information in its original ...

Journal

Journal title: Computer Speech & Language

Year: 2021

ISSN: 1095-8363, 0885-2308

DOI: https://doi.org/10.1016/j.csl.2020.101166